Understanding the Complexity of Detecting Political Ads
Online political advertising has grown significantly over the last few years.
To monitor online sponsored political discourse, companies such as Facebook,
Google, and Twitter have created public Ad Libraries collecting the political
ads that run on their platforms. Currently, both policymakers and platforms are
debating further restrictions on political advertising to deter misuse.
This paper investigates whether we can reliably distinguish political ads
from non-political ads. We take an empirical approach to analyze what kind of
ads are deemed political by ordinary people and what kind of ads lead to
disagreement. Our results show a significant disagreement between what ad
platforms, ordinary people, and advertisers consider political and suggest that
this disagreement mainly comes from diverging opinions on which ads address
social issues. Overall, our results imply that social issue ads should be
considered political, but that doing so complicates the regulation of political
advertising.
Comment: Proceedings of the Web Conference 2021 (WWW '21), April 19-23, 2021,
Ljubljana, Slovenia
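The abstract does not say how agreement between ad platforms, ordinary people, and advertisers was quantified. As one illustration only, Cohen's kappa is a standard chance-corrected statistic for agreement between two labelers on the same ads. The sketch below assumes binary "political"/"non-political" labels; the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelers on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both labelers chose labels independently,
    # each with their own marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels: ad platform vs. majority vote of crowd workers.
platform = ["political", "political", "non-political", "non-political"]
crowd = ["political", "non-political", "non-political", "non-political"]
print(cohen_kappa(platform, crowd))  # 0.5
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance; for more than two labelers, Fleiss' kappa is the usual generalization.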
Correlating User Profiles in Social Networks: Methods and Applications
The proliferation of social networks and all the personal data that people share brings many opportunities for developing exciting new applications. At the same time, however, the availability of vast amounts of personal data raises privacy and security concerns. In this thesis, we develop methods to identify the social network accounts of a given user. We first study how we can exploit the public profiles users maintain in different social networks to match their accounts. We identify four important properties (Availability, Consistency, non-Impersonability, and Discriminability, or ACID) to evaluate the quality of different profile attributes for matching accounts. Exploiting public profiles has good potential for matching accounts because a large number of users have the same names and other personal information across different social networks. Yet, achieving matching accuracy that is useful in practice remains challenging due to the scale of real social networks. To demonstrate that matching accounts in real social networks is feasible and reliable enough to be used in practice, we focus on designing matching schemes that achieve low error rates even when applied in large-scale networks with hundreds of millions of users. Then, we show that we can still match accounts across social networks even if we only exploit what users post, i.e., their activity on social networks. This demonstrates that, even if users are privacy conscious and maintain distinct profiles on different social networks, we can still potentially match their accounts. Finally, we show that, by identifying accounts that correspond to the same person inside a social network, we can detect impersonators.
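The abstract mentions matching schemes that keep error rates low in networks with hundreds of millions of users but does not detail them. A minimal sketch of one common way to trade recall for precision, assuming accounts are compared on a single display-name attribute: accept the best candidate only if its similarity clears a threshold and beats the runner-up by a margin, so that ambiguous names (which become frequent at scale) are left unmatched. `difflib.SequenceMatcher` stands in for whatever similarity the thesis uses; `match_account`, `threshold`, and `margin` are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    # Stand-in similarity in [0, 1]; the thesis's own metric may differ.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_account(source_name, candidate_names, threshold=0.9, margin=0.1):
    """Return the best-matching candidate name, or None if unsure.

    Hypothetical rule: the best score must reach `threshold` AND exceed the
    second-best score by `margin`, rejecting ambiguous matches.
    """
    scores = sorted(
        ((name_similarity(source_name, c), c) for c in candidate_names),
        reverse=True,
    )
    if not scores:
        return None
    best_score, best = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    if best_score >= threshold and best_score - runner_up >= margin:
        return best
    return None


print(match_account("Alice Smith", ["alice smith", "Bob Jones"]))  # alice smith
print(match_account("J. Smith", ["john smith", "jane smith"]))  # None
```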
Marketing to Children Through Online Targeted Advertising: Targeting Mechanisms and Legal Aspects
Many researchers and organizations, such as WHO and UNICEF, have raised
awareness of the dangers of advertisements targeted at children. While most
existing laws only regulate television ads that may reach children,
lawmakers have been working on extending regulations to online advertising,
for example by forbidding (the DSA) or restricting (COPPA) advertising to
children based on profiling. At first sight, ad platforms such as Google
seem to protect children by not allowing advertisers to target their ads to
users who are less than 18 years old. However, this paper shows that other
targeting features can be exploited to reach children. For example, on YouTube,
advertisers can target their ads to users watching a particular video through
placement-based targeting, a form of contextual targeting. Hence, advertisers
can target children by placing their ads in children-focused videos. Through a
series of ad experiments, we show that placement-based targeting is possible on
children-focused videos and enables marketing to children. In addition, our ad
experiments show that advertisers can use targeting based on profiling (e.g.,
interest, location, behavior) in combination with placement-based advertising
on children-focused videos. We discuss the lawfulness of these two practices
under the DSA and COPPA. Finally, we investigate the extent to which real-world
advertisers employ placement-based targeting to reach children with ads
on YouTube. Our measurement methodology consists of building a
Chrome extension to capture ads and instrumenting six browser profiles to watch
children-focused videos. Our results show that 7% of ads that appear in the
children-focused videos we test use placement-based targeting. Hence, targeting
children with ads on YouTube is not only hypothetically possible but also
occurs in practice.
Understanding the Privacy Risks of Popular Search Engine Advertising Systems
We present the first extensive measurement of the privacy properties of the
advertising systems used by privacy-focused search engines. We propose an
automated methodology to study the impact of clicking on search ads on three
popular private search engines which have advertising-based business models:
StartPage, Qwant, and DuckDuckGo, and we compare them to two dominant
data-harvesting ones: Google and Bing. We investigate the possibility of third
parties tracking users when clicking on ads by analyzing first-party storage,
redirection domain paths, and requests sent before, during, and after the clicks.
Our results show that privacy-focused search engines fail to protect users'
privacy when clicking ads. Users' requests are sent through redirectors on 4%
of ad clicks on Bing, 86% of ad clicks on Qwant, and 100% of ad clicks on
Google, DuckDuckGo, and StartPage. Even worse, advertising systems collude with
advertisers across all search engines by passing unique IDs to advertisers in
most ad clicks. These IDs allow redirectors to aggregate users' activity on
ads' destination websites in addition to the activity they record when users
are redirected through them. Overall, we observe that both privacy-focused and
traditional search engines engage in privacy-harming behaviors allowing
cross-site tracking, even in privacy-enhanced browsers.
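The measurement pipeline itself is not described in the abstract. A minimal sketch, assuming the redirect chain of one ad click has already been recorded as a list of URLs (first: the search engine's ad link; last: the landing page): it lists the intermediate hosts as redirectors and flags landing-page query parameters whose values look like unique identifiers. It covers only URLs, not first-party storage. The 16-character rule is a hypothetical heuristic, and the example.* domains are placeholders.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Hypothetical heuristic: long opaque tokens are treated as candidate unique IDs.
ID_LIKE = re.compile(r"[A-Za-z0-9_-]{16,}")


def analyze_click_chain(chain):
    """Return (redirector hosts, ID-like query parameters on the landing page)."""
    hosts = [urlsplit(url).hostname for url in chain]
    # Every hop strictly between the ad link and the landing page is a redirector.
    redirectors = hosts[1:-1]
    landing_query = urlsplit(chain[-1]).query
    id_params = {
        key: value
        for key, value in parse_qsl(landing_query)
        if ID_LIKE.fullmatch(value)
    }
    return redirectors, id_params


chain = [
    "https://search.example/ad?q=shoes",
    "https://redirect.example/click?ref=1",
    "https://shop.example/landing?utm_source=ads&clickid=Ab3dEf5gHi7jKl9mNo",
]
print(analyze_click_chain(chain))
# (['redirect.example'], {'clickid': 'Ab3dEf5gHi7jKl9mNo'})
```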
Exploring the Online Micro-targeting Practices of Small, Medium, and Large Businesses
Facebook and other advertising platforms exploit users' data for marketing
purposes by allowing advertisers to select specific users and target them (a
practice called micro-targeting). However, advertisers such as
Cambridge Analytica have maliciously used these targeting features to
manipulate users in the context of elections. The European Commission plans to
restrict or ban some targeting functionalities in the new European Democracy
Action Plan to protect users from such harms. The difficulty is that we do
not know the economic impact of these restrictions on regular advertisers. In
this paper, to inform the debate, we take a first step by understanding who is
advertising on Facebook and how they use the targeting functionalities. For
this, we asked 890 U.S. users to install a monitoring tool on their browsers to
collect the ads they receive on Facebook and information about how these ads
were targeted. By matching advertisers on Facebook with their LinkedIn
profiles, we could see that 71% of advertisers are small and medium-sized
businesses with 200 employees or less, and they are responsible for 61% of ads
and 57% of ad impressions. Regarding micro-targeting, we found that only 32% of
small and medium-sized businesses and 30% of large-sized businesses
micro-target at least one of their ads. These results should not be interpreted
as micro-targeting not being useful as a marketing strategy, but rather that
advertisers prefer to outsource the micro-targeting task to ad platforms.
Indeed, Facebook employs optimization algorithms that exploit user data to
decide which users should see which ads, which means that ad platforms perform
algorithm-driven micro-targeting. Hence, when setting restrictions,
legislators should take into account both traditional advertiser-driven
micro-targeting and algorithm-driven micro-targeting performed by ad
platforms.
On Detecting Policy-Related Political Ads: An Exploratory Analysis of Meta Ads in 2022 French Election
Online political advertising has become the cornerstone of political
campaigns. The budget spent solely on political advertising in the U.S. has
increased by more than 100%, from $700 million during the 2017-2018 U.S.
election cycle to $1.6 billion during the 2020 U.S. presidential election.
Naturally, the capacity offered by online platforms to micro-target ads with
political content has been worrying lawmakers, journalists, and online
platforms, especially after the 2016 U.S. presidential election, when
Cambridge Analytica targeted voters with political ads congruent with their
personality.
To curb such risks, both online platforms and regulators (through the Digital
Services Act, or DSA, proposed by the European Commission) have agreed that researchers, journalists,
and civil society need to be able to scrutinize the political ads running on
large online platforms. Consequently, online platforms such as Meta and Google
have implemented Ad Libraries that contain information about all political ads
running on their platforms. This is the first step on a long path. Due to the
volume of available data, it is impossible to go through these ads manually,
and we now need automated methods and tools to assist in the scrutiny of
political ads.
In this paper, we focus on political ads that are related to policy.
Understanding which policies politicians or organizations promote and to whom
is essential in determining dishonest representations. This paper proposes
automated methods based on pre-trained models to classify ads in 14 main policy
groups identified by the Comparative Agenda Project (CAP). We discuss several
inherent challenges that arise. Finally, we analyze policy-related ads featured
on Meta platforms during the 2022 French presidential election period.
Comment: Proceedings of the ACM Web Conference 2023 (WWW '23), May 1-5, 2023,
Austin, TX, US
SLIM: Scalable Linkage of Mobility Data
We present a scalable solution to link entities across mobility datasets using their spatio-temporal information. This is a fundamental problem in many applications such as linking user identities for security, understanding privacy limitations of location-based services, or producing a unified dataset from multiple sources for urban planning. Such integrated datasets are also essential for service providers to optimise their services and improve business intelligence. In this paper, we first propose a mobility-based representation and similarity computation for entities. An efficient matching process is then developed to identify the final linked pairs, with an automated mechanism to decide when to stop the linkage. We scale the process with a locality-sensitive hashing (LSH) based approach that significantly reduces the number of candidate pairs for matching. To put these techniques into practice, we introduce an algorithm called SLIM. In the experimental evaluation, SLIM outperforms the two existing state-of-the-art approaches in terms of precision and recall. Moreover, the LSH-based approach brings a speedup of two to four orders of magnitude.
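SLIM's own mobility representation and similarity are not given in the abstract. To illustrate only the general idea of LSH-based candidate reduction, the sketch below assumes each entity is summarized as a set of (cell, time-bin) tokens, computes MinHash signatures (an LSH family for Jaccard similarity), and splits them into bands; only entities that agree on at least one whole band become candidate pairs for the more expensive matching step. The entity names, tokens, and band/row counts are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(token, seed):
    # Seeded 64-bit hash; each seed plays the role of one random permutation.
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(tokens, num_perm):
    """MinHash signature of a non-empty token set."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_perm)]


def lsh_candidate_pairs(entities, bands=8, rows=4):
    """entities: dict mapping entity id -> set of (cell, time_bin) tokens."""
    buckets = defaultdict(set)
    for name, tokens in entities.items():
        sig = minhash(tokens, bands * rows)
        for b in range(bands):
            # Entities whose signatures agree on a whole band share a bucket.
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


entities = {
    "taxi_a": {("cell_12", 8), ("cell_13", 9), ("cell_40", 18)},
    "phone_b": {("cell_12", 8), ("cell_13", 9), ("cell_40", 18)},
    "phone_c": {("cell_90", 3)},
}
print(lsh_candidate_pairs(entities))  # {('phone_b', 'taxi_a')}
```

With `bands` b and `rows` r, two entities with Jaccard similarity s become candidates with probability 1 - (1 - s^r)^b, so more rows make the filter stricter and more bands make it more permissive.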
Collaborative Ad Transparency: Promises and Limitations
Several targeted advertising platforms offer transparency mechanisms, but researchers and civil society have repeatedly shown that these mechanisms have major limitations. In this paper, we propose a collaborative ad transparency method to infer, without the cooperation of ad platforms, the targeting parameters used by advertisers to target their ads. Our idea is to ask users to donate data about their attributes and the ads they receive and to use this data to infer the targeting attributes of an ad campaign. We propose a Maximum Likelihood Estimator based on a simplified Bernoulli ad delivery model. We first test our inference method through controlled ad experiments on Facebook. Then, to further investigate the potential and limitations of collaborative ad transparency, we propose a simulation framework that allows varying key parameters. We validate that our framework gives accuracies consistent with real-world observations, so that the insights from our simulations are transferable to the real world. We then perform an extensive simulation study for ad campaigns that target a combination of two attributes. Our results show that we can obtain good accuracy whenever at least ten monitored users receive an ad. This usually requires a few thousand monitored users, regardless of population size. Our simulation framework is based on a new method to generate a synthetic population with statistical properties resembling the actual population, which may be of independent interest.
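The abstract names a Maximum Likelihood Estimator over a simplified Bernoulli delivery model without giving its form. A minimal sketch under stated assumptions, not the authors' estimator: the campaign targets an AND of two attributes; a monitored user who matches the targeting receives the ad with unknown probability p (estimated as the fraction of matching users who received it), while a non-matching user receives it only with a small hypothetical noise probability `eps`. Every attribute pair is scored by log-likelihood and the best is returned. Attribute names and users are made up.

```python
import math
from itertools import combinations


def bernoulli_loglik(k, n, p):
    """Log-likelihood of k receptions among n users, each with probability p."""
    ll = 0.0
    if k:
        ll += k * math.log(p)
    if n - k:
        ll += (n - k) * math.log(1 - p)
    return ll


def infer_targeting(users, attributes, eps=0.01):
    """users: list of (attribute set, received_ad) pairs from monitored users.

    Returns the attribute pair (AND-targeting) with the highest likelihood,
    or None if no pair matches any monitored user.
    """
    best, best_ll = None, -math.inf
    for combo in combinations(sorted(attributes), 2):
        target = set(combo)
        inside = [got for attrs, got in users if target <= attrs]
        outside = [got for attrs, got in users if not target <= attrs]
        if not inside:
            continue  # no monitored user could have been targeted
        p_hat = sum(inside) / len(inside)  # MLE of the delivery probability
        ll = bernoulli_loglik(sum(inside), len(inside), p_hat)
        # Non-matching users should (almost) never receive the ad.
        ll += bernoulli_loglik(sum(outside), len(outside), eps)
        if ll > best_ll:
            best, best_ll = combo, ll
    return best


attributes = {"age_18_24", "likes_music", "likes_sports"}
users = [
    ({"age_18_24", "likes_music"}, True),
    ({"age_18_24", "likes_music", "likes_sports"}, True),
    ({"age_18_24", "likes_music"}, False),
    ({"age_18_24", "likes_sports"}, False),
    ({"likes_music", "likes_sports"}, False),
    ({"likes_music"}, False),
]
print(infer_targeting(users, attributes))  # ('age_18_24', 'likes_music')
```

With only six hypothetical users this estimate is fragile; the paper reports that good accuracy requires at least ten monitored users receiving the ad.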
Characterizing end-host application performance across multiple networking environments
Users today connect to the Internet everywhere: from home, work, airports, friends' homes, and more. This paper characterizes how the performance of networked applications varies across networking environments. Using data from a few dozen end-hosts, we compare the distributions of RTTs and download rates across pairs of environments. We show that for most users the performance difference is statistically significant. We contrast the influence of the application mix and environmental factors on these performance differences.
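The abstract does not name the significance test used to compare RTT or download-rate distributions between two environments. One standard choice is the two-sample Kolmogorov-Smirnov test; the sketch below computes its statistic D (the largest gap between the two empirical CDFs) and compares it with the large-sample critical value at the 5% level, 1.358 * sqrt((n + m) / (n * m)). The RTT samples are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(xs, ys):
    """Two-sample KS statistic: max |F_x(v) - F_y(v)| over observed values."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)  # empirical CDF of xs at v
        fy = bisect_right(ys, v) / len(ys)  # empirical CDF of ys at v
        d = max(d, abs(fx - fy))
    return d


def differs_at_5pct(xs, ys):
    """Asymptotic KS test at alpha = 0.05 (c(alpha) is about 1.358)."""
    n, m = len(xs), len(ys)
    critical = 1.358 * math.sqrt((n + m) / (n * m))
    return ks_statistic(xs, ys) > critical


# Hypothetical RTT samples (ms) for one user at home and at an airport.
home_rtts = [float(x) for x in range(10, 60)]
airport_rtts = [float(x) for x in range(100, 150)]
print(differs_at_5pct(home_rtts, airport_rtts))  # True
```

For small samples, an exact p-value (for example from SciPy's `scipy.stats.ks_2samp`) is preferable to this asymptotic threshold.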
Exploiting Innocuous Activity for Correlating Users Across Sites
We study how potential attackers can identify accounts on different social network sites that all belong to the same user, exploiting only innocuous activity that inherently comes with posted content. We examine three specific features on Yelp, Flickr, and Twitter: the geo-location attached to a user's posts, the timestamp of posts, and the user's writing style as captured by language models. We show that, among these three features, the location of posts is the most powerful for identifying accounts that belong to the same user on different sites. When we combine all three features, the accuracy of identifying Twitter accounts that belong to a set of Flickr users is comparable to that of existing attacks that exploit usernames. When we instead correlate Yelp and Twitter, our attack identifies 37% more accounts than an attack based on usernames. Our results have significant privacy implications, as they present a novel class of attacks that exploit users' tendency to assume that, if they maintain different personas with different names, the accounts cannot be linked together, whereas we show that the posts themselves can provide enough information to correlate the accounts.
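The abstract reports that post location is the strongest of the three features but does not give the exact scoring method. A minimal sketch, assuming posts carry latitude/longitude: bin each account's posts into grid cells of a hypothetical 0.1-degree size, compare accounts by cosine similarity of their cell histograms, and link a target account to the most similar candidate. Account names and coordinates are made up.

```python
import math
from collections import Counter


def grid_histogram(points, cell_deg=0.1):
    """Count posts per (lat, lon) grid cell; cell size is a hypothetical choice."""
    return Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in points
    )


def cosine(h1, h2):
    """Cosine similarity of two sparse histograms (0.0 if either is empty)."""
    dot = sum(h1[c] * h2[c] for c in h1.keys() & h2.keys())
    norm = math.sqrt(sum(v * v for v in h1.values())) * math.sqrt(
        sum(v * v for v in h2.values())
    )
    return dot / norm if norm else 0.0


def best_match(target_points, candidates):
    """Return the candidate account whose location histogram is most similar."""
    target = grid_histogram(target_points)
    return max(candidates, key=lambda name: cosine(target, grid_histogram(candidates[name])))


# Hypothetical geo-tagged posts: one Flickr account, two Twitter candidates.
flickr_posts = [(48.85, 2.35), (48.86, 2.34)]
twitter_candidates = {
    "tw_paris": [(48.85, 2.35), (48.86, 2.34)],
    "tw_nyc": [(40.71, -74.00)],
}
print(best_match(flickr_posts, twitter_candidates))  # tw_paris
```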